Data Detective: Excel & Gretl Learning Game

Welcome to Data Detective!

You've been hired as a Data Detective at the Higher Education Research Institute to analyze training and development data from polytechnic and higher education institutions.

Your mission: Use data analysis to uncover insights that will improve faculty training programs and institutional performance.

This interactive game will teach you how to:

  • Clean and prepare data for analysis
  • Create and interpret correlation matrices
  • Perform regression analysis and understand R-squared
  • Identify and address autocorrelation in time series data
  • Visualize data and present findings

No prior experience with Excel or Gretl is required!

Module 1: Data Detective Academy

Your first assignment is to clean a messy dataset of faculty training records.

Data Cleaning Basics

Before analyzing data, we need to ensure it's clean and properly formatted. Common data issues include:

  • Missing values
  • Duplicate entries
  • Inconsistent formatting
  • Outliers
Faculty_Training.xlsx
File Home Insert Data Review

Your Task:

Identify and fix the following issues in the dataset:

  1. Find and remove duplicate entries
  2. Identify missing values and decide how to handle them
  3. Check for outliers in the Training_Hours column

Knowledge Check (2.5%)

1. Which of the following is NOT a common data cleaning task?

2. When dealing with outliers in a dataset, which approach is generally NOT recommended?

3. Which Excel function would you use to find the average of a range of cells?

Module 2: The Case of the Missing Correlations

The institute needs to understand relationships between faculty training hours and student satisfaction scores.

Understanding Correlation

Correlation measures the strength and direction of the relationship between two variables:

  • Correlation coefficient ranges from -1 to +1
  • +1 indicates a perfect positive correlation
  • -1 indicates a perfect negative correlation
  • 0 indicates no linear correlation

Strong Positive Correlation

r ≈ +0.9

No Correlation

r ≈ 0

Strong Negative Correlation

r ≈ -0.9

Gretl: Correlation Matrix

Your Task:

Analyze the correlation matrix to identify significant relationships between variables:

  1. Identify the strongest positive correlation
  2. Identify the strongest negative correlation
  3. Determine which training type has the strongest relationship with student satisfaction

Knowledge Check (2.5%)

1. A correlation coefficient of -0.8 indicates:

2. Which statement about correlation is FALSE?

3. In a correlation matrix, what do the values along the diagonal represent?

Module 3: Regression Investigation

Predict future training needs based on historical data and faculty performance metrics.

Understanding Regression Analysis

Regression analysis helps us understand how changes in independent variables affect a dependent variable:

  • Simple regression: One independent variable
  • Multiple regression: Two or more independent variables
  • R-squared: Measures how well the model explains variation in the dependent variable
Gretl: Regression Output

Understanding R-squared

R-squared (R²) measures the proportion of variance in the dependent variable explained by the independent variables:

Total Variance
Explained by Model
Unexplained
R² = 0.00

Your Task:

Analyze the regression output to determine which factors most influence student satisfaction:

  1. Identify which training type has the strongest effect on student satisfaction
  2. Interpret the R-squared value
  3. Determine if the overall model is statistically significant

Knowledge Check (2.5%)

1. An R-squared value of 0.75 means:

2. In a regression output, a p-value of 0.03 for a coefficient indicates:

3. A negative coefficient in a regression model means:

Module 4: The Time Series Mystery

Analyze 5 years of faculty development data to identify trends and seasonal patterns.

Understanding Time Series Analysis

Time series data has special characteristics that require specific analytical approaches:

  • Trends: Long-term movements in the data
  • Seasonality: Regular patterns that repeat at fixed intervals
  • Autocorrelation: When observations are related to previous observations

Understanding Autocorrelation

Autocorrelation occurs when the error terms in a regression model are correlated over time:

  • Durbin-Watson statistic: Measures autocorrelation (values near 2 indicate no autocorrelation)
  • Positive autocorrelation: DW < 2 (common in time series data)
  • Negative autocorrelation: DW > 2 (less common)
0.27
Positive Autocorrelation
No Autocorrelation
Negative Autocorrelation
0 1 2 3 4
Gretl: Time Series Regression

Your Task:

Analyze the time series data and address autocorrelation issues:

  1. Identify if autocorrelation is present in the model
  2. Apply a correction method (add lagged variables or use first differences)
  3. Compare the original and corrected models

Knowledge Check (2.5%)

1. A Durbin-Watson statistic of 0.27 indicates:

2. Which of the following is NOT a method to address autocorrelation?

3. Autocorrelation in a regression model can lead to:

Module 5: The Final Report

Compile findings into a comprehensive report for the institute's board.

Data Visualization and Reporting

Effective data visualization and reporting are crucial for communicating your findings:

  • Choose appropriate chart types for different data relationships
  • Create clear, labeled visualizations
  • Interpret results in context
  • Provide actionable recommendations

Training Hours vs. Performance

Training Type Distribution

Training Effectiveness Over Time

Faculty Performance by Experience

Report Structure

1. Executive Summary

Brief overview of key findings and recommendations

2. Data Analysis

Detailed analysis of faculty training data

3. Correlation Findings

Key relationships between variables

4. Regression Results

Factors influencing student satisfaction and faculty performance

5. Time Series Trends

Patterns and projections over time

6. Recommendations

Evidence-based suggestions for training programs

Your Task:

Create a comprehensive report based on your findings from the previous modules:

  1. Select appropriate visualizations for key findings
  2. Interpret the results in the context of faculty development
  3. Provide actionable recommendations for improving training programs

Knowledge Check (2.5%)

1. Which chart type is best for showing the relationship between two continuous variables?

2. When reporting regression results, which of the following should NOT be included?

3. Which of the following is an example of an actionable recommendation based on data analysis?

Congratulations, Data Detective!

Certificate of Completion
Data Detective: Excel & Gretl Learning Game
This certifies that you have successfully completed all modules and demonstrated proficiency in data analysis using Excel and Gretl.

Skills Acquired:

  • Data Cleaning and Preparation
  • Correlation Analysis
  • Regression Analysis and R-squared Interpretation
  • Time Series Analysis and Autocorrelation
  • Data Visualization and Reporting
Final Score: 0%

Next Steps

To continue your learning journey:

  • Try the Excel-based version of this game for hands-on practice
  • Apply these skills to your own research data
  • Explore advanced statistical techniques in Gretl